Bioinformatics analysis of various signal peptides for periplasmic expression of parathyroid hormone in E. coli

Hypoparathyroidism is a rare endocrine disease characterized by low serum calcium levels. Recombinant human PTH (rhPTH) is prescribed for the management of refractory hypoparathyroidism. The aim of this study was to investigate 32 signal peptides of gram-negative bacterial origin and evaluate their potential as fusion partners for efficient secretion of recombinant human PTH (1–84) in E. coli, in order to obtain higher expression of recombinant PTH in bacterial systems. The SignalP server was employed to predict the presence and location of signal peptide cleavage sites in the protein sequences, and the ProtParam server to compute various physical and chemical parameters of the proteins. The SOLpro server was applied to predict protein solubility, and the ProtComp and SecretomeP online servers were employed to predict protein location. The evaluations showed that, theoretically, the signal peptides of the lipopolysaccharide export system protein LptA (lptA) and the periplasmic pH-dependent serine endoprotease DegQ (degQ) are the most appropriate of those examined. Because PTH lacks post-translational modifications, periplasmic expression in E. coli is a favorable option. The results of this study suggest that bioinformatics tools and reliable servers can be used to identify signal peptides with appropriate secretory potential, which may lead to higher expression levels.


Introduction
Parathyroid hormone (PTH) is an endocrine hormone secreted by the chief cells of the parathyroid glands and is essential in bone remodeling. PTH is secreted in response to low serum calcium levels. It indirectly increases bone resorption through osteoclast activity in bone, which releases more ionic calcium into the blood and elevates serum calcium levels. The production of 1,25(OH)2D in the kidney is tightly controlled and is increased by parathyroid hormone. PTH is a polypeptide of 84 amino acids with a molecular mass of approximately 9.5 kDa, and its half-life is estimated to be 4 minutes.
PTH is a single-chain polypeptide that is free of disulfide bonds and glycosylation sites. Hypoparathyroidism is a rare endocrine disease characterized by hypocalcaemia caused by inappropriately low or absent levels of parathyroid hormone [1,2]. Hypoparathyroidism is estimated to occur after almost [...]
The commercial success of any recombinant protein depends on its potential for large-scale production and a lower cost of the final product. The cost of producing a recombinant product can therefore be reduced by enhancing the efficiency of large-scale production [8].
Overexpression of many proteins in an E. coli host results in the accumulation of insoluble protein as inclusion bodies, from which the yield of biologically active protein is very low and for which purification methods are often expensive and laborious. An alternative that prevents the formation of inclusion bodies is secretion of the protein into the bacterial periplasmic space using an appropriate signal peptide [9,10]. Directing the protein of interest into the periplasmic space facilitates the subsequent protein separation steps, thereby reducing downstream costs. In bacteria, translocation of proteins across the cytoplasmic membrane towards the periplasmic space is usually mediated by the interaction of their N-terminal signal peptides with the protein secretion machinery. To obtain a higher yield of a recombinant protein of interest, a signal peptide can be fused to it artificially. Immediately after translocation, signal peptides are proteolytically removed by signal peptidase at a very specific site.
Signal peptides have three main functional and structural regions: a basic N-terminal region (n), a hydrophobic core region (h), and a highly polar C-terminal region (c). Recognition of the signal peptide and cleavage at the correct site are extremely important for proper function of the protein. E. coli has different systems for transferring proteins from the cytosol into the periplasmic space [11]. Transfer from the cytoplasm to the periplasm occurs through the twin-arginine translocation (Tat) pathway or the Sec-dependent pathway [12].
The aim of this study was to investigate 32 signal peptides of gram-negative bacterial origin and evaluate their potential as fusion partners for efficient secretion of recombinant human PTH (1-84) in E. coli, in order to obtain higher expression of recombinant PTH in bacterial systems.

Investigation of signal peptide sequences
The amino acid sequences of 32 signal peptides of gram-negative bacterial origin were obtained from the UniProtKB database via the ExPASy server. The collected sequences were used to identify signal peptides suitable for improving the periplasmic expression of recombinant PTH in E. coli.

Prediction of the signal peptide cleavage site
SignalP server 4.1 (http://www.cbs.dtu.dk/services/SignalP/) was applied to predict the presence and location of signal peptide cleavage sites in the protein sequences. The accuracy of this server in detecting signal peptides has been estimated to be around 87% [13]. SignalP server 3 was employed to study the n, h, and c regions of the signal peptides.

Computation of physicochemical characteristics of signal peptides
ProtParam (http://web.expasy.org/protparam/) is a tool that computes various physical and chemical parameters for an entered protein sequence, including the molecular weight, theoretical pI, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY). ProtParam was therefore employed to evaluate the physicochemical characteristics of the sequences [14].

Prediction of protein solubility
The SOLpro server (http://scratch.proteomics.ics.uci.edu/), a sequence-based predictor of protein solubility with an overall accuracy of over 74%, was used to predict protein solubility upon overexpression in E. coli [15,16].

Prediction of secreted protein location
Two servers were used to investigate protein location after secretion. ProtComp server version 9 (http://www.softberry.com/berry) and SecretomeP 2.0 (http://www.cbs.dtu.dk/services/SecretomeP) combine several prediction methods to determine protein location based on protein characteristics [13,17].

Isolation of signal peptides
To find signal peptides with secretion ability and other desirable features, the amino acid sequences of 32 secretory signal peptides of gram-negative bacterial origin were investigated. Sequences with confirmed secretion potential were extracted from the UniProt database (UniProtKB), linked to the parathyroid hormone sequence, and compared in several stages to identify those giving the greatest predicted secretion of PTH in E. coli (Table 1).
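
As a minimal sketch of this step, each fusion construct can be built by placing a signal peptide directly in front of the mature PTH sequence and writing the result in FASTA format for submission to the prediction servers. The function names, record names, and all sequences below are placeholders introduced for illustration; they are not the sequences used in the study, which come from UniProtKB.

```python
def fuse(signal_peptide: str, mature_protein: str) -> str:
    """Return an N-terminal fusion: signal peptide followed by the mature protein."""
    return signal_peptide.strip().upper() + mature_protein.strip().upper()


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record wrapped at `width` residues per line."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


# Placeholder inputs (assumptions): made-up strings, not real UniProtKB entries.
signal_peptides = {
    "sp_example_1": "MKKLLAALLLSAFA",
    "sp_example_2": "MRRTLLAVALLGSAQA",
}
mature_protein = "SVSEIQ"  # stand-in for the full mature protein sequence

records = [
    to_fasta(f"{name}_PTH", fuse(sp, mature_protein))
    for name, sp in signal_peptides.items()
]
print("\n".join(records))
```

Keeping the fusion and the formatting in separate functions means the same fused sequence can be reused unchanged for every server in the workflow.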

Prediction of signal peptides cleavage site
The cleavage sites of the signal peptides linked to PTH were predicted with SignalP-4.1, one of the most commonly used and reliable servers for this purpose. Based on the results of this server, signal peptides that can be cleaved at the predicted cleavage site can be identified. Potential signal peptides were predicted using the discrimination score (D-score); the default cutoff value of 0.5 was used as the basis for predicting the cleavage potential of the sequences (Table 2).
Table 2 (partial): [...] (9), 15-20 (6), No, 0.302; 6, ushA, 1-6 (6), 7-18 (12), 19-25 (7), ALA-MS, 0.726; 7, pagP, 1-6 (6), 7-18 (12), 19-25 (7) [...]
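
The cutoff rule described here amounts to a simple filter on the D-score. The sketch below is illustrative only: the `Prediction` type and function name are assumptions, the ushA D-score of 0.726 is the value reported in Table 2, and the second entry is a hypothetical low-scoring peptide.

```python
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):
    name: str
    d_score: float


def split_by_d_score(
    predictions: List[Prediction], cutoff: float = 0.5
) -> Tuple[List[Prediction], List[Prediction]]:
    """Separate peptides at or above the D-score cutoff from those below it."""
    kept = [p for p in predictions if p.d_score >= cutoff]
    rejected = [p for p in predictions if p.d_score < cutoff]
    return kept, rejected


predictions = [
    Prediction("ushA", 0.726),             # value reported in Table 2
    Prediction("hypothetical_sp", 0.302),  # hypothetical name, illustration only
]
kept, rejected = split_by_d_score(predictions)
print("kept:", [p.name for p in kept])
print("rejected:", [p.name for p in rejected])
```

A peptide scoring exactly at the cutoff is kept here, which matches the text's exclusion of peptides with a D-score lower than 0.5.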

In silico analysis of physicochemical characteristics of signal peptides
The physicochemical characteristics of PTH linked to each of the 32 signal peptides were evaluated from the amino acid sequence using the ProtParam server. Structural and functional characteristics such as signal peptide length, molecular weight, pI, GRAVY, aliphatic index, and instability index were examined for the fusion proteins containing the studied signal peptides (Table 3).

Prediction of protein solubility
SOLpro is among the most accurate and reliable of the servers currently used to predict protein solubility. The solubility of PTH fused to each signal peptide, together with the predicted solubility percentage for each construct, is shown in Table 3. The results showed that PTH is predicted to be insoluble when linked to any of the signal peptides used.
To identify and rank the most appropriate signal peptides linked to PTH, the 22 signal peptides were compared based on the parameters mentioned above and then ordered by aliphatic index. Finally, 12 signal peptides with an aliphatic index of more than 95 were selected for further analysis (Table 4).
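
The ranking and selection step can be sketched as a sort followed by a threshold. The peptide names and aliphatic index values below are hypothetical and are not taken from Table 3 or Table 4; only the threshold of 95 comes from the text.

```python
from typing import Dict, List, Tuple


def select_by_aliphatic_index(
    aliphatic: Dict[str, float], minimum: float = 95.0
) -> List[Tuple[str, float]]:
    """Rank peptides by aliphatic index (highest first) and keep those above `minimum`."""
    ranked = sorted(aliphatic.items(), key=lambda item: item[1], reverse=True)
    return [(name, value) for name, value in ranked if value > minimum]


# Hypothetical values for illustration only.
example = {"sp_a": 101.3, "sp_b": 94.8, "sp_c": 97.6, "sp_d": 95.0}
selected = select_by_aliphatic_index(example)
print(selected)
```

The comparison is strict, so a peptide with an aliphatic index of exactly 95 is not selected, in line with "more than 95".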

Prediction of the secretory protein location
Prediction of protein location is a vital step in drug design. In this study, the ProtCompB server version 9 was employed to predict the final cellular location of PTH in E. coli; the server applies several algorithms to assign the protein to a cellular location.
The secretory ability of the final signal peptides was then investigated using the SecretomeP server. The latest version of this server predicts the secretory ability of proteins by comparing their biochemical properties with those of cytoplasmic proteins. The server reports its prediction as a SecP score, and the threshold score for bacterial sequences is 0.5. Ten of the final signal peptides failed to reach the minimum score for PTH secretion, and only two earned a score of more than 0.5. These scores indicate that the two signal peptides have a higher secretion potential than the others.
Based on the results of this study, which were obtained by evaluating several important parameters involved in the secretory expression, lptA and degQ signal peptides have greater potential for favorable secretory expression of PTH in bacterial systems.

Discussion
Development of a new drug is a very complicated process that requires time and money. Predictive methods are now powerful and increasingly important tools for saving time and money in drug development. Accordingly, several studies have used bioinformatics methods, which, in addition to these benefits, can increase the precision and accuracy of empirical studies. Bioinformatics is an interdisciplinary field that is widely used to process biological information through computational methods and that links different disciplines such as computer science and statistics [18,19].
Overexpression of recombinant proteins in E. coli can lead to improper folding and to their accumulation in aggregated, insoluble intracellular bodies called inclusion bodies [20,21]. Adding a suitable signal peptide to the N-terminus of the desired protein can direct it to the periplasmic space and can resolve the improper folding associated with overexpression of recombinant proteins in E. coli [22,23]. To achieve an appropriate level of recombinant protein secretion into the periplasmic space, the signal peptide fused to the desired protein should have suitable features. Various computational tools are available to predict physicochemical properties of a signal peptide, such as molecular weight, isoelectric pH, GRAVY, and aliphatic index. The signal peptides selected to increase PTH secretion in E. coli were compared with each other. To achieve reliable results that can be generalized, the use of hypothetical signal peptides was avoided as far as possible. Appropriate signal peptides for the periplasmic expression of the PTH gene were extracted from UniProtKB, one of the most reputable databases for raw data extraction.
Another reason for using UniProtKB is that it provides functional data about proteins with accurate annotations, and its curators continuously collect, process, update, and display data to improve their availability and usability [24,25]. Accordingly, the accession numbers of this database were used to identify the amino acid sequences of the signal peptides.
Based on the results of the SignalP server, 10 signal peptides earned a D-score lower than 0.5 and were excluded from the study because they were predicted not to be cleaved. The remaining 22 signal peptides exhibited a correct cleavage site and adequate cleavage potential and were used for further investigation.
Signal peptides are composed of three regions, namely n, h, and c. The n-region at the N-terminus often contains positively charged amino acids, the h-region is the intermediate hydrophobic region that contains at least six amino acids, and the c-region at the C-terminus consists of polar uncharged amino acids and the signal peptide cleavage site. Because these three regions cannot be determined with the SignalP-4.1 server, SignalP-3 was used to determine and study them (Tables 1 and 2). Although the three regions of a signal peptide do not function independently, the length of the h-region is one of the most important parameters in protein secretion [26].
In an in silico study on 48 human signal peptides fused with the human growth hormone for periplasmic secretion, Douzandeh et al. also used the SignalP server to predict the signal peptides cleavage site and showed that only 17 signal peptides were eligible for further analysis [27].
All signal peptides originated from gram-negative bacteria, and their lengths ranged between 17 and 34 amino acids. Typically, a signal peptide sequence in gram-negative bacteria contains 18-30 amino acids, so the studied signal peptides are in an appropriate range [28]. The pI was calculated based on the pKa values of the amino acids. Since the pKa value of each amino acid depends on its side chain, the pI plays a major role in determining the pH-dependent properties of a protein. In addition, the GRAVY value, the sum of the hydropathy values of all amino acids divided by the length of the sequence, was calculated for all signal peptides.
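
The GRAVY definition given here can be written out directly. The sketch below assumes the Kyte-Doolittle hydropathy scale, which is the scale ProtParam uses for GRAVY; the function name and the toy input sequences are illustrative only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: sum of hydropathy values / sequence length."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Toy inputs: a hydrophobic stretch scores positive, a charged one negative.
print(round(gravy("LLAVAL"), 3))
print(round(gravy("KRDEKR"), 3))
```

A positive GRAVY value indicates a predominantly hydrophobic sequence and a negative value a predominantly hydrophilic one, which is why the measure is useful for comparing signal peptides.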
Previous findings have shown that increasing the hydrophobicity and the length of the h-region improves protein secretion [29]. The aliphatic index indicates the hydrophobicity level of the studied sequence and is defined as the relative volume occupied by the aliphatic side chains of alanine, leucine, isoleucine, and valine. Calculated from the mole fractions of these amino acids, the aliphatic index can be regarded as a positive factor for the thermostability of proteins, as it is significantly higher in proteins of thermophilic bacteria than in those of non-thermophilic ones [30]. The instability index, which estimates the stability of a protein from its primary structure, was also calculated for the selected sequences (Table 3). Proteins with an instability index below 40 are predicted to be stable, and those with a value above 40 are predicted to be unstable [31].
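
As a worked illustration of these two indices, the sketch below computes the aliphatic index with the weighting used by ProtParam (mole percent of alanine, plus 2.9 times that of valine, plus 3.9 times that of isoleucine and leucine combined) and applies the instability threshold of 40. The function names and toy sequences are assumptions for illustration; the instability index itself is taken as an input, since it is computed by the server.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * sequence.count(residue) / len(sequence)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def is_predicted_stable(instability_index: float, threshold: float = 40.0) -> bool:
    """Classify a protein as stable when its instability index is below the threshold."""
    return instability_index < threshold


# Toy sequences: one residue each of A, V, I, L versus a glycine-only stretch.
print(aliphatic_index("AVIL"))
print(aliphatic_index("GGGG"))
print(is_predicted_stable(35.2), is_predicted_stable(48.7))
```

Because valine, isoleucine, and leucine carry weights above 1, sequences rich in these residues score well above 100, which is consistent with the high aliphatic indices expected for hydrophobic signal peptides.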
The hydrophobicity of different signal peptides can be assessed using parameters such as the aliphatic index, GRAVY, and the length of the h-region. Signal peptides can therefore be compared by sorting on these hydrophobicity measures alongside other important indices such as the D-score of each sequence. In a study conducted by Douzandeh et al. on hGH, the same server was used to investigate various physicochemical characteristics of signal peptides.
Protein accumulation and folding in the cell compete with each other, as the overexpression of unfolded proteins leads to their accumulation and the formation of inclusion bodies. Protein insolubility is a major barrier to the commercial production of protein-based pharmaceuticals [32].
Solubility is one of the major characteristics of every protein. It is directly associated with the amino acid sequence and is an important factor in the production, formulation, and preparation of pharmaceutical recombinant proteins, because subsequent measurements must be made on soluble protein [33]. Protein solubility is influenced by intrinsic factors, such as the amino acids on the protein surface, and by extrinsic factors that can be measured empirically, such as pH, ionic strength, and temperature [34]. With the growing use of proteins as pharmaceuticals and the need to express them in large amounts, the prediction of protein solubility has become increasingly important [33].
In most cases, the expression of heterologous proteins in bacteria leads to the generation of insoluble proteins and inclusion bodies [32]. The results of this stage were consistent with the findings obtained from similar studies in relation to the analysis of signal peptides fused with the human growth hormone (hGH), in which all signal peptides fused with hGH protein were predicted to be insoluble. For example, Douzandeh et al. used this server to evaluate the solubility of hGH protein in E. coli.
In a gram-negative bacterial host, a protein is predicted to be located in the cytoplasm, the inner or outer membrane, or the periplasm, or to be secreted. The ProtComp server uses several methods to predict protein location, including neural network-based prediction, comparison with homologous proteins of known location, and prediction of functional peptide sequences.
The results indicated that 9 out of the 12 studied signal peptides have a great potential for PTH secretion into the periplasmic space, whereas 2 of them do not have the ability to cross the bacterial membrane and the last signal peptide secretes the protein out of the bacterium.
In the SecretomeP analysis, ten of the final signal peptides failed to reach the minimum score for PTH secretion and only two earned a score above 0.5, indicating a higher secretion ability for these two peptides than for the others.

Conclusion
Empirical studies on the periplasmic production of recombinant hormones in E. coli using signal peptides derived from gram-negative bacteria have reported varying efficiencies. Moreover, because such studies require a large amount of time and money, they are not cost-effective. Bioinformatic analysis now allows many signal peptides fused with PTH to be investigated, which saves time and cost and can increase accuracy. The results of this study, obtained from several reputable servers in multiple stages, showed that some signal peptides theoretically have the ability to direct periplasmic expression of this protein in E. coli. Based on the multistage evaluation used here and the established use of bioinformatic surveys in the analysis of signal peptides, these results can inform empirical work and other studies on the periplasmic expression of a variety of protein-based pharmaceuticals in E. coli. Based on the study findings, the lptA and degQ signal peptides, when fused with human PTH, theoretically have greater potential for periplasmic expression than the other studied signal peptides. Further studies are needed to confirm these results and to apply them in future experiments.